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Abstract 

Transcription factors (TFs) are key regulators of gene expression. Based 
on the classical scenario in which the TF search process switches between 
one-dimensional motion along the DNA molecule and free Brownian mo- 
tion in the nucleus, we study the arrival time of several TFs to multiple 
binding sites and derive, in the presence of competitive binding ligands, 
the probability that several target sites are bound. We then apply our 
results to the hunchback regulation by bicoid in the fly embryo and we 
propose a general mechanism that allows cells to read a morphogenetic 
gradient and specialize according to their position in the embryo. 

Keywords: modeling, stochastic binding, diffusion of transcription factor, 
gene activation, morphogenetic gradient, cell differentiation. 

1 Introduction 

Transcription factors (TFs) are key regulators that can initiate or inhibit gene 
activation by binding to specific DNA sites. TFs enter the cell nucleus and 
search for their specific binding sites on the DNA molecule. In a context of 
competition for activation and inhibition, the search for target sites should not 
be too long otherwise, another gene might be activated. This is the case for 
olfactory gene activation [1] , where a single G-coupled receptor out of hundreds 
is expressed in a single cell, while all other receptors are repressed. Thus cells 
might use various mechanisms to control gene activation including changing lo- 
cal properties of the DNA molecule (for example by methylation or acetylation) 
or changing the properties of the TF interaction as what happens when specific 
molecules bind to TFs and modify their affinity for the DNA molecule. 

The mean time to reach a target site is thus a fundamental parameter of 
gene activation and several biophysical scenarios have been proposed to esti- 
mate it. Berg and Von Hippel Pl 13111] realized that the search time cannot be 
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computed using a three dimensional random walk only, because the observed 
search time is indeed shorter. Using the property that the TF scans the DNA 
base pairs for potential binding sites, they proposed that the search consists 
of consecutive cycles of three dimensional free diffusion in the nucleus and one 
dimensional random motion along the DNA molecule. DNA base pairs are not 
electrically charged and so no long distance interactions are involved, thus, the 
TF should physically come close to the DNA in order to generate a true interac- 
tion. During the one dimensional search, the TF is confined to a neighborhood 
of the DNA molecule and can detach due to thermal fluctuations and a new 
ld-3d cycle resumes until the target is eventually reached. This basic scenario 
has been confirmed experimentally by single particle tracking experiments [21] 
and investigated theoretically, by accounting for the local base pair interactions, 
the mean number to scan the base pairs per cycle, the free diffusion time and 
the time the TF is bound to the DNA molecule [5| [51 [71 [51 [TU] . 

However, it is still unclear how to go from the TF search time to the mech- 
anism responsible for cell specialization. Cells in a living tissue are imbedded 
in a matrix of positional information, generated by morphogenetic gradients 
[ITJ [H [H [H [T5]. A first step consists of the ability of the cell to "read out" 
the local characteristics of the morphogenetic gradient so that the cell can be 
labeled and acquire its own identity. To address this question, here we inves- 
tigate how several binding sites can be bound at steady state by several TFs 
generated by an external steady state gradient. In particular, we are led to 
compute the mean time for some TFs to bind to many target sites. The number 
of bound sites can be seen as a digital-converter used by the cell to discriminate 
two different morphogenetic gradient concentrations depending whether or not 
all the binding sites are saturated. We further study the effect of competitive 
ligands that can modulate the TF's action leading to gene repression. 

First, we present our computation for the time for a single TF to bind to 
one of multiple potential binding sites. The distribution of the arrival time is 
always a sum of two exponentials but, for a large number of free diffusion and 
DNA binding cycles, the arrival time is single exponentially distributed. We 
then expand our analysis to multiple TFs with multiple targets in the presence 
of enzymes leading to degradation. We apply our results to estimate the number 
of active sites when the cell nucleus experiences a steady state TF influx. Finally, 
we estimate the steady state probability that a given number of binding sites 
are occupied. In our model, this probability describes the proportion of time a 
gene is actively transcribed. We apply our analysis to the initial patterning of 
the fly embryo by the bicoid (bed) morphogenetic gradient. The bed gradient 
regulates a number of downstream TFs involved in the gap gene network [161 [17], 
which determines the position of body sections along the anterior-posterior (A- 
P) axis in the drosophila embryo. Among these gap genes, hunchback (hb) 
is responsible for thoracic development [1^1 [TT]. Hb activation leads to the 
formation of a sharp boundary and to the formation of stripes. We use our 
analysis of TF binding to determine the hb density induced by bed activation, 
and we show that this hb-bcd interaction-modulation is sufficient to generate 
the transition from a smooth bicoid gradient into a sharp hb boundary in the 
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middle of the drosophila embryo. Our approach provides a general scenario at 
a molecular level of TF interactions that lead to cell specialization. 



2 Distribution and mean of the search time 

We first summarize the properties associated with the TF's search process to its 
binding site. The TF switches between a free diffusion and random walk along 
the DNA molecule [1 13 H HU]. 

1. Due to the interaction potential with the DNA backbone TF, the TF can 
bind unspecifically to the DNA molecule. The strength of the interaction 
potential is around IQkT [121 [S], larger than the thermal noise ~ kT. In 
this deep well approximation, the random time a TF stays bound is 
exponentially distributed |20| . Experimental and theoretical estimates for 
the average time are in the range of a few milliseconds [21] [10] . 

2. A bound TF slides along the DNA molecule during a random time 
scanning an average number n{T^) of base pairs (bp). The mean number 
n = ET-^(n(Td)) of base pairs scanned before detaching is on the order of 
100 [1I1[IU]. 

3. A TF can detach from the DNA due to thermal noise and diffuse freely 
in the nucleus until it rebinds to the DNA. When the DNA molecule oc- 
cupies a small fraction of the nucleus and can be approximated as long 
rods, the random time r/ a TF spends diffusing in the nucleus is expo- 
nentially distributed [TU] with an average r/, which is on the order of a 
few milliseconds [21] [10]. However, for larger density and a complex DNA 
organization, the distribution time in general is a sum of exponentials and 
might even be more complicated. 

We start with Uf copies of a TF, alternating independently between periods 
of free diffusion and random walks along the DNA until one of the Ug binding 
sites is found. We further consider competitive ligands that can bind to the TF 
target sites, preventing the sites to be occupied by TFs. The ligand L binds to 
the target site S according to a first order reaction: 



with an association and a dissociation rate ka and kd respectively. At equilib- 
rium, Michaelis-Menten reaction says that for a concentration C of ligands, the 
probability that a binding site is not occupied is: 



S + lks.L 



(1) 



1 



(2) 



P 
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Search time for a single TF 



The random time r(l, Ug) for a single TF to bind to a target site is the sum of 
the time spent in one and three dimensions. Using the characteristic function of 
the search time T(l,ns), we can express the probabihty density function (pdf) 

PTW = ^Pr{r(l,n,)<i} (3) 
as follows (see appendix): 

PT{t) = + , (4) 

f 2 - ri ri ri — r2 r2 

where ri and r2 are the two positive roots of (1 — xTd) (1 — xTf) — 1 +p{ns) — 
and p{ns) is the probability to find a target during a single one dimensional 
walk along the DNA. The associated mean binding time is: 

oo 

r(l,n.) = / tpTm - ^ + ^. (5) 

J ri(r2-ri) r2{ri ~ r2) 



In the limit p{ns) <C 1, using the expression for the two roots and approximating 
the pdf pt (eq.Hl) by a single exponential for a time t such that ^=-^ + =j^t^l 
(see appendix), we obtain that 



PT{t) ^ J' _ e ^3+^ . (6) 

Td+Tf 



Since and t f are both on the order of a few ms [TOl [21] , the single exponential 
limit is valid for t larger than a few ms. The mean time T(l,ns) then reduces 
to 

T(l,n,)«.I^. (7) 

The mean number of free diffusions and DNA bindings before finding the target 
site is equal to The limit p{ns) ^ 1 corresponds to TFs that find their 

target sites after a large number of cycles. 



Search time for multiple TFs 

When there are n f TFs that can potentially bind to Us identical binding sites (a 
site can only be occupied by a single TF), in the single exponential limit, the time 
T{nf,ns) for the first TF to bind a site is the minimum of the Uf exponential 
laws of mean time T(l,ns). T{nf,ns) is then exponentially distributed with 
mean: 

T{nf,ns) = . (8) 
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We shall consider rig well separated sites (by at least a distance of n base pairs) . 
In this case, the probability to find each site during a DNA biding is -^P- where 

^* bp 

Nbp is the total number of base pairs in the genome. Furthermore, in the 
presence of competitive ligands, there are Pus available binding sites. Thus, 
the probability of binding to one of the sites is p{ns) = Pugj^ and the 
mean binding time for well separated sites with p{ns) <C 1 is: 

Tinj,ns) « Zl+lL^ill+lA (9) 

Ufpiris) UfPUsTL 



Ts 



n 



(10) 



where 



Tj, {Td + Tf)Nbp . . 

= P^ 

is the search time for a single TF with a single target site. 
Remark 1. Formula ([T0| describes the combined effect of multiple but well 
separated binding sites. When the sites are clustered, the mean time to find a 
target becomes a nonlinear function of the distribution [23l HH [25l [26l [27] and 
has been approximated by the Berg-Purcell approximation formula |28) . When 
there are Ug binding sites of size a, located on an ensemble of DNA-molecules 
on a sphere of radius R, the mean time Td in 3d to find a site is: 

Dh \^nR Ansa J ^ ' 

This formula can be improved |29l |30] . Here Dh is an effective diffusion constant 
that accounts for the switch between the ID DNA motion and the 3D diffusion. 
When the one ID excursion length is small compared to the 3D diffusion length, 

In the other cases, one has to deal with random jumps. 

Remark 2 For P = \ (no competitive ligand), n « 100 [HI [5l [10] and for a 
relatively small genome A^hp = 10^, pijis) is approximated by: 

p(n,) « n,10-^ (14) 

Thus pijis) ^ 1 is valid as long as the number of binding sites satisfies Ug ^ 10^. 
We conclude that pijis) ^ 1 is verified in most cases. 



3 From a morphogenetic gradient to DNA site 
activation 

We shall now apply our previous results to estimate the number of occupied 
sites when a nucleus receives a steady influx of TFs. This steady influx of TFs 
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entering the nucleus could, for example, either be imported from outside the 
cell or be steadily produced in the cytoplasm of the cell. We consider that 
gene expression is proportional to the mean time the binding sites are occupied. 
Since the number of binding sites occupied controls gene expression, we shall 
estimate, for a given TF influx, the mean proportion of time the binding sites 
are occupied. 

3.1 Activation of a single binding site 

We first compute the average occupation ratio Pi of a single binding site before 
considering multiple sites in the following section. To compute Pi, we use Bayes' 
law and sum over the number of TFs in the nucleus: 



where P{nf) is the probability of having rif TFs in the nucleus and P(l|n/) is 
the conditional probability that a single binding site is occupied when there are 
71/ TFs. To proceed with the computation of Pi, we assume that TFs arrive in 
the nucleus at a Poissonnian rate A and are degraded (free or bound) by enzymes 
at a rate K. Thus, the number of TFs in the nucleus follows a birth and death 
process and is distributed according to a Poisson law with mean a — 



We now compute P(l|n/). When a TF has found the target, it stays attached 
for a mean time Tb- We consider that the rate of binding and unbinding to the 
sites is faster than the rate of TF turn over in the nucleus and that the steady 
state between binding and unbinding is reached, thus 




(15) 



nnj) = —re 



(16) 



P(l|n/) = 



n Tb 



(17) 



where /3 



Using equations [TBI and ((T7]), we get: 



Pi 




(18) 
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where x = au. We plot in figure [l}i the occupation ratio Pi as a function of 
the average number a of TFs in the nucleus for different values of /3. When the 
competitor ligand concentration C varies, the occupation ratio is modulated 
as described in figure [Hd. Using Lac I data [H] and in the absence of DNA 
binding competitor (P = 1), the total search time is T5 = 6min [211 [10], while 
T{, K, 70min [31] and thus the ratio is/3 = =^«l/ll. We conclude (red curve 
Fig[T^) that for low /?, the target site can be occupied for a significant proportion 
of time. In particular, small fluxes of TFs can induce significant modifications 
on gene expression in a target cell. 



3.2 Activation with multiple binding sites 

When there are binding sites, we shall now compute the proportion of time 
Pfe that k sites are occupied. Using Bayes' law, we have: 

00 

Pfc = P(/c|n/)P(ny), (20) 

where P(7t/) is the probability to have nj TFs given by expression (|16p. We 
now compute P(fc|n/) by analyzing a Markov chain :33 which describes the 
probability f'q{t) that q sites are occupied at time t. 

When q sites are occupied, the total release rate is =- while the arrival rate 

to a site is given by T {nf ~ q, Us — q) = ("z^^^"""*?) with equation (|lll) . The 
forward and backward rate of the Markov chain are given by: 

F, = - - (21) 

Ts 

B, = (22) 

b 
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(a) Pi as a function of a 



a 



C in |jMoI 

(b) Pi as a function of the competitor concen- 
tration C 



Figure 1: (a) Pi as a function of a for various values of /?. From left to 
right, /3 increases 1/10 (red), 1/3 (green), 1 (yellow), 3 (blue). The upper curves 
correspond to fast search times and/or long binding times to the target site and 
no competitors, (b) Pi as a function of the competitor concentration C 
in /xMol. The upper curve is obtained for a = 5, the lower one is for a = 2, 



where ^ = /3o (l + Cf^j with /3o = 15 for C = and |^ = 20AiMori 



and the Markov chain is given by |33| : 

^¥{q, t\nf) = ~{Fg + Bg)Piq, t\nf) + F,^i¥iq - 1, t\nf) + Bg+i¥{q + 1, t\nf), (23) 
with the boundary conditions: 



d 
dt 

d 



-P{nf,t\nf) - F„,_iP(n/-l,t|n/)-B„,P(n/,i|n/) (24) 



-P(0,i|n/) - -Fo¥iO,t\nf) + BiV{l,t\nf). (25) 

We consider that the rate of binding and unbinding to the sites is faster than 
the rate of TF turn over in the nucleus and that the steady state is achieved 
quickly, thus: 

= -{Fy + B^)Fiq\nf) + ^,_iP(g - l\nf) + B,+iP(<z + l\nf), (26) 

where ¥(q\nf) — ¥{q,oo\nf). By induction ,33] . for k < n' ^ niin(ny,ns) (the 
maximal number of sites occupied by TFs), we get: 

^ fc-i 

F{k\nf)=P{0\nf)-— l[{nf~j){ns~j), (27) 



where: 



/3=^. (28) 



For k > n' ~ min(n/,ris), V{k\nf) = since there can be no more than n' TFs 
bound. Using the normalization condition, 



J2mnf)^l, (29) 



fe=0 



we finally get for 1 < /c < n': 



gk\ n ("/ - j)K - i) 



and for fc = 0: 



mnj) - ^r^^Vl ■ (30) 



P(OM = . (31) 

j=i ■ i=0 
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Using expressions (|20l) and (l?71) . we obtain for 1 < fc < risi 

fe-i 

oo ^n("/-i)K-i) 



and for k = 0: 



= e " + V ^e-" ^ ^ , (33) 

1+ E ^n("/-j)K-j) 



and Pfc = for fc > as there can not be more than TFs bound to the 
target sites. We shall now derive asymptotic expressions for when a <^ 1 
and /3 ^ 1, which correspond respectively to a small average number of TFs in 
the nucleus and to TFs that stay bound to the targets a long time compared to 
the search time. 

Asymptotics for a small 

With the expression of P("./) given in (fTH]) and the summation (|20p . only the 
terms nj = 0, 1 contribute to the first order asymptotic in a ^ 1. With (|27p 
we obtain: 

77 

P(l|l) = -^P(0|nl) (34) 
and with P(l|l) + P(0|1) « 1, 

11 

P(l|l) 



P(0|1) = 1-P(1|1). 

With (I20|), for a < 1, 

Pi « ae"" ^« 35) 

We conclude that the probability that one site is occupied is given by the average 
number of TFs a multiplied by the probability ^"^^ to have one site occupied 
when there is one TF in the nucleus. 

Asymptotic for j3 small 

We now compute the asymptotic for /3 1. 
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1. When the number of TFs is larger than the number of available sites 
(n/ > Us), using equation (pO|. for /3 <C 1, only the terms P(ns — l|n/) and 
¥{ns\nf) contribute to the first order for /3 ^ 1. Using the normalization 
relation 

F{ns - l\nf) +P{ns\nf) « 1. 
Furthermore with ([50)) . 

Bn 

Tif — Tig + 1 

we then obtain: 

"•(".-II"/) - T^j^y m 

We ignore all other probabilities in the first order for /3 ^ 1. When 
> and /? 1 almost all sites are occupied. 

2. When there are less TFs than the number of available sites (0 < ny < rig), 
then for /? ^ 1 only P(fc = n/ — l|n/) and P(fc — nf\nf) have a contribution 
in the leading order of equation (|30|) . We obtain: 



We neglect all other probabilities in the first order for /? ^ 1. 

Combining equations (I16p . (|20p and the first order approximations in /3, the 
probability P„^, that all sites are simultaneously occupied is: 

f3ns \ d^s 



Using the partial sum: 

ris — l ^ 



fc! 

k=Q 



and after some computation (see appendix) we can write: 

1 

P„, (a) = 1 - e-"5(a) - /Jn.e"" / du. (42) 
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For /3 ^ 1, is an increasing function of a and a decreasing function of 
/3 (see appendix). Increasing the number a of TFs leads to an increase in 
the probability that all sites are occupied, while increasing f3 decreases the 
probability that all sites are occupied. 

Similarly (see appendix), with equations ([^ . and (|^ the asymptotic 
expression of the occupation ratio Pk for /3 ^ 1 is given by: 

' 1 -^f ^ R - /■ - Sjau) ^ 
1 — e b{a) — piisC I du 



(n. - 1)! V 2 J ' i 

We plot in figure [5] (resp. ^ the function P^^ as a function of a for rig ~ 2 (resp. 
= 4). 



for k = Ug 



for /c = 71,5 
for k < Ug 



3.3 Consequence of the analysis for gene expression sta- 
bility 

We now use our results on the occupation ratios Pfc to show that at least two 
binding sites are required for the regulation of a genetic switch. A genetic 
switch is a special type of autoregulated gene. These autoregulated genes code 
for TFs that regulate the transcription of their own gene. A genetic switch is a 
process allowing two stable values for the transcription rate of a gene [34] : an 
"on" position where the gene is transcribed and an "off' position where it is 
not transcribed. If the gene is "on", transcription is maintained at high levels 
through autoregulation. If the gene is "off', transcription remains at low levels 
and does not turn on without an external signal. Genetic switches play a central 
role in cellular differentiation, memory and plasticity [35l |36] . 

We shall show that a bistable genetic switch can only appear when there 
are at least two binding sites regulating the transcription of the gene. We 
first determine the steady state concentration of the TF due the balance of 
production and degradation by enzymes. For a gene transcribed at a rate r 
when there are Non occupied binding sites, the steady state production A of 
TFs verifies: 

A = rPAr„„ = r/(a), (44) 

where /(a) — PAr^^(a) is given by formula (l43t and depends on Non and Ug. 
The steady state value a. = satisfies the nonlinear equation: 

Rf{a) = a, (45) 

where R = ^. Bistability appears when equation (|45p has two stable solutions, 
thus equation (j45p must have three solutions (two stable and one unstable in 
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(a) Probability for Us = 2 and 13 = (b) Probability for Us = 2 and /3 = 1 



Figure 2: for rig = 2. (a) as a function of a for /3 = j^^. P^. is computed 
through approximation (|43p . For a TF activating its own transcription when 
both sites are simultaneously occupied, the two stable values for a (high and 
low values for a) and the unstable value (in the middle) are represented along 
the dotted line, (b) The impact of a change in /3. Curves in red and green 
are for /3 = 1/10, the curve in blue is for (3 = 1. For |^ = 20/iMol-i [S] 
and f3 — when C — 0, this corresponds to a ligand concentration of 0/xMol 
(red and green) and 0.5 /iMol in blue. The curve in red is computed through 
through approximation ()43|) . Curves in blue and green are computed through 
finite sums of (l32|) (200 terms). 
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a a 

(a) Probability Pfc for = 4 and f3 = (b) Probability Pj. for = 4 and (3 = 1 



Figure 3: Pfc for ris = 4 as a function of a for (a) f3 = 1/10 and (b) /? = 1. 

Fk is computed througli finite sums of ((32)) (200 first terms). With P — jq when 
C = and = 20/iMol~^ [3T], this corresponds to a ligand concentration of 
O^iMol (left) and 0.5 ^Mol (right). 

between) . The number of solutions depends on the parameters iVo„ and R: For 
= 0, as plotted in figures [2] and |3l /(a) = a/R has only one solution. 
For Non = 1, f{oi) = a/R has one solution for R small and two solutions for 
R large. For Non > 2, using formula (|43| and as plotted in figures [2] and [3l 
/ is a sigmoid type function. For R sufficiently large, equation (|45t has three 
solutions (figure[2]) and two of them are stable. A gene following such activation 
properties is a bistable switch. Conversely, for R sufficiently small, a « is the 
only stable solution. The critical value of R can be characterized geometrically, 
as the point where a/R is tangent to f{a). For this critical value there is a 
stable point at the origin and a saddle point at the tangent point. To conclude, 
a bistable switch requires at least two binding sites regulating the gene and the 
parameter R must be sufficiently large. 

4 Formation of the Hunchback boundary by the 
Bicoid gradient 

We shall now apply our analysis to determine the formation of the Hunchback 
TF (hb) boundary by the Bicoid (bed) morphogen gradient in the drosophila 
embryo. The bed gradient regulates a number of downstream TFs involved in 
the gap gene network [16[ I17j , which determine the position of body sections 
along the anterior-posterior (A-P) axis in the drosophila embryo. Among these 
gap genes, hb is responsible for thoracic development UHl E]. Given a bed 



14 



gradient, we propose to determine the spatial distribution of hb. Our analysis 
shows how a broad bed gradient can trigger a sharp transition in the hb density 
in the middle of the embryo. We reproduce the bed and hb density measured 
in vivo [in] in figure H^. To distinguish the values of a and /3 for the hb and 
bed TFs required in our previous model, we shall use subscript h for the hb TF 
and f, for the bed TF. We approximate bed gradient as exponential [IB] : 

ab(x) = Be-'=^ (46) 

where x G [0, 1] is the normalized A-P position (x — X/ L where L is the length 
of the drosophila embryo). We use k — 5.5, corresponding to the best fit for 
the in vivo data |16| . The constant B cannot be obtained directly from in vivo 
data. However, since 



-k{x~'-^)^ (47) 



changing the value of B is equivalent to an x-translation of the hb and bed den- 
sities. We choose B such as the hb boundary is in the middle of the drosophila 
embryo (see figure |3)d). 

hb transcription results from the binding of the hb TF and the bed TFs to a 
promoter with 6 bed binding sites and 2 hb sites [37l [38l [39] . Hb is transcribed at 
a rate r when there are two hb or at least one bed bound to the sites, described 
as 

0At least 1 bed bound , , 
> hb 



rate r 
fi, 2hb bound 



>hb. 



The hb density is proportional to the steady state production of hb given by 
A = r(l — P), where 

P = Po,6(1-P2./.) (48) 

is the probability that hb is not transcribed, Po.b is the probability that no bed 
are bound to the promoter and 1 — ¥2.11 the probability that there are not two 
hb bound. At equilibrium, using a/i = , we obtain the steady state equation 

tth = ^ =i?(l-Po.6(l-P2,h)), (49) 

where R = ^ and K is the degradation constant for hb. Equation (|49| is implicit 
for the mean number ah of hb, that we shall now compute. We will now evaluate 
separately expressions Po.b and V2,h- Along the A-P axis parameterized by the 
position X, Po,6 depends on the mean number ai,{x) of bed TFs and on the ratio 
/3f, = =^ of the search time of bed over the binding time. To evaluate (3b, we 
use the binding reaction of a bed to its target site S: 

S + bcd^ S.bcd, (50) 
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where S.bcd is the bed TF bound to its target site. The equihbrium constant 

[S.bcd] 
[S] [bed] 



Kd = /fi'r^'^ ji is the ratio of the forward to the backward rate of (15(1)) . equiva- 



lently: 



For Kd = 0.24nM [40], a nucleus of volume V « lfj,m^ and with Na the Avo- 
gadro number we obtain Pb — KdNaV. ~ 0.14. 

To compute Po,&, we use formula (|43p with fc = 0, = 6 and obtain: 

(52) 



We shall now evaluate the probability 1 — P2,h- In the absence of any precise 
data on the dissociation constant of hb from its binding site, we consider that 
binding is fast enough so that w 0. Using expression (|43p for the probability 
^2,h with k = Us = 2, we obtain 

1 - P2,/i = e""'^ + a/.e-"^ (53) 

Finally, at steady state, the equilibrium condition ([^^ reads: 

r(i- e-"" (l + ^) (e-"-^ + a^e-"'^)) = a,. (54) 



We solve equation ([M)) numerically (with Maple) to express ah as a function of 
ab- We plot in figure |4^-b several solutions associated with different values of R 
and B. As pointed out in equation and plotted in figure S^, changing the 
value of B is equivalent to a x-translation of the hb and bed densities. To further 
study the different types of solutions, we will vary the parameter R. Following 
the discussion in section 13.31 on bistability, for Non = n,, = 2 the dynamics for 
hb can potentially be bistable. We show now that for Ug = 2 and i? < 3, hb 
is always monostable. To compute the critical value Rc after which bistability 
occurs, we shall use the functions: 

P{x) = l-Po,6 (55) 
f{ah) = F2,h, (56) 

where P{x) depends on x through ab (06]). The function f{ah) is the fraction 
of time hb is autoactivated by the hb and P{x) is the fraction of time the gene 
is activated by the bed gradient. Equation (|54|) can then be rewritten as: 



R{l-{l-P{x)){l- f{aH)))^ah. (57) 

We determine in the appendix the critical value for bistability given by Rc — 3. 
For R < Rc the gene is always monostable, while for R> Rc the gene is bistable 
for some values of P{x) and monostable for others: 
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• For i? > 3, hb is monostable for x < Xc and bistable for x > Xc where Xc 
is a critical position. We represent the bifurcation diagram in figure |4ji. 
Changing B is equivalent to an x-translation in the hb profile and thus 
B can be adjusted such as the bifurcation point is Xc = 0.5 for example. 
If at time t = Q there is no hb, the hb density converges to the lower 
stable value, as represented in figure |3)d. Nevertheless, for a bistable hb 
dynamic, cells located inx > Xc can switch from the low to the high stable 
value for a sufficient perturbation. In the absence of a repressor of hb on 
the posterior side of the embryo, these cells would stay in the high stable 
state. 

• For i? < 3, hb is always monostable. When R becomes close to 3, there is 
already a boundary in the hb density (figure This boundary can be 
characterized by the point where f{ah) changes concavity and becomes 
tangent to a linear function (figure [5]). At the point of concavity change, 
a small variation in P{x) induces a large variation in at which produces 
a sharp transition in the hb density. 

We conclude that for an auto-regulated hb gene, when the bifurcation parameter 
R is close but smaller than the critical value Rc, there is a sharp boundary of hb 
in the embryo and this boundary does not require a repressor in the posterior 
half of the embryo. As shown in figure [Sj at the boundary hb synthesis is 
essentially due to autoactivation of hb (the activation P{x) due to bed is « 10% 
whereas the gene is autoactivated « 40% of the time). To obtain a numerical 
estimation of R, we use the synthesis rate r generated by two hb bound to 
the target sites and the degradation rate K of hb. Using the values from the 
supplementary material of '37', r « 19 and K w 7.08 and we obtain 

R « 2.7. (58) 

For R = 2.7, we observe a steep transition of the hb density at the middle of 
the embryo as in the in vivo data from [16 reproduced in figure [B^. The main 
difference between the theoretical density (figure HJd) and the in vivo data from 
|16j (figure |B^) is in the anterior edge where our model leads to an increase of 
the hb density instead of a decay as observed in vivo. This decay in the hb 
density at the anterior edge of the embryo is due to a repressive effect induced 
by the huckebein TF (hkb) [17] which we did not model in (|54|) and we shall 
examine now. 

Refining the gradient using hkb repressor 

We now account for the repression induced by hkb and consider that the tran- 
scription of the hb gene is repressed when at least one hkb is bound to the 
promoter site. Similarly to the analysis that lead us to equation (|49|). we ob- 
tain: 

ah = RFo.hkbil - Po,b(l - P2,h))- (59) 
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(a) ah{x) for difTcront values of B (b) ah{x) for different values of R 




(c) — Pq ~ 1^2 h)) ~ f*h a-s a function of (d) Bifurcation diagram of 

ah 



Figure 4: (a) Hb concentration ah{x) for different values of B: B = 

2.35/2 (blue), B = 2.35 (green) and B = 2.35 * 2 (red). Here, we use R = 2.7. 
All curves for an where scaled by a factor 25 to obtain the same numerical values 
as the the concentration in arbitrary units for in vivo data reproduced in figure 
int- (b) ah{x) for different values oi R: R = 2.5 (monostable), R = 2.7 
(monostable, value from |37]), R = 3 (critical value for bistability) and R = 3.5 
(bistable). For R = 3.5, there are two stable points: the high (dotted lines) and 
the low (continue line) stable value. B in was adjusted for each of the curves 
to cut 25 in a; = 0.5: B = 3.3 for R = 2.5, B = 2.35 for R = 2.7, B ^ 1.58 for 
i?, = 3 and B = 0.95 for R = 3.5. (c) i?(l - Po,fc(a6(a;))(l - P2,/,(a,0)) - ah as 
a function of ah for R — 3.5. The curves are for x = 0.4, 0.5 and 0.6. We use 
B = 0.95 as in figure|4}D. (d) Bifurcation diagram of ah{x). This bifurcation 
diagram is given by the solutions of R{1 — Po,b(ab(a;))(l — f2,h{ah))) — ah — 
as a function of x. We use B — 0.95 and R — 3.5 as in figure Ht. 
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(a) Boundary in the density of the autoregu- 
lated TF. 

Figure 5: Boundary in the density of the autoregulated TF. Are rep- 
resentd an/R (blue) and the proportion of time 1 — Po,&(l — ^2,h{ah)) the hb 
gene is active for P{x) = 1 — Po,6 = (blue), 0.1 (red) and 0.2 (yellow). The 
curves are all for the critical value i? = 3 to amplify the boundary in hb. We 
use B = 1.58 as in figure|4]D. The boundary comes from 1 — Po,fc(l — ^2.h{ah)) 
which is tangent to an/R at the point where f2,h[cth) changes concavity. A 
small variation in P{x) then induces a large variation in Uh- 
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where fo^tkb is the probabihty that no hkb are bound. We assume hkb binds to 
its target fast enough and shah consider that (3hkb — 0. Finally, Po.hfcb is then 
given by: 

Po,hfcb = e-""^ (60) 

To evaluate the distribution ahkb we fit the measured hkb distribution '17' with 
an exponential function: 

ahkb = Ce'="''^ (61) 

where khkb = 11.3 (see figure [6j:). The value of C can not be obtained directly 
from experimental measurements. Changing the value of C is equivalent to an 
x-translation of the repression due to hkb. We calibrated C to have the same 
value for the hb density as in the vivo data (figtHh- and d). We solve equation 
(|59p numerically and obtain an hb density represented in figure ^jp. This new 
theoretical density obtained is now close to the in vivo data (figure |6^), in 
particular we recover the sharp boundary of hb. The main differences between 
the theoretical and experimental densities are located at the posterior side of 
the hb boundary where we obtain a higher density than the vivo data and at 
the posterior edge where the density is lower. The difference at the posterior 
side of the hb boundary might be due to repression of hb by the knirps TF [17] 
which is not modelled here. As for the difference at the posterior edge, this can 
be due to activation of hb by the Caudal TF Tf. Taking into account these 
two regulation pathways should lead to a refined analysis of the hb density. 



5 Conclusion 

In this paper, we studied transcription activation by TFs starting from the 
stochastic nature of the search process for a DNA promoter site. We then 
applied our computations to estimate the sharp boundary induced by a smooth 
gradient of TFs. In the first part, we focused on the kinetics of the binding of 
TFs to their target sites located on the DNA molecule: when the average number 
of cycles of free diffusions and DNA bindings before finding the target sites is 
large, the search time T(nf, n^) is exponentially distributed and we estimate the 
mean (relation ([TO)) V Next, we considered the case of a cell receiving a steady 
state influx of TFs which can be enzymatically degraded. We modeled the 
dynamics of the TFs' binding and unbinding their target sites and we estimated 
the fraction of time that k out of Ug sites are occupied at steady state. For 
= 1 we obtain an explicit expression in equation ()19|) . For rig > 1, the general 
expression of P^ is given by an equation p2|) and an asymptotic development 
for /? ^ 1 is provided in expression P5)) . We presented the different occupation 
ratios in figure [2] and [3] for two and four sites respectively. We consider that the 
transcription rate is proportional to the fraction of time a given number of sites 
are occupied. For a defined TF concentration entering the nucleus, our model 
provides a quantitative input-output relation in terms of the transcription rate. 
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(a) In vivo concentration of bed and hb (b) Theoretical and experimental concentra- 
tions of hb 
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(c) Exponential fit of cJhfc6{a;) (d) aji(x) for different values of C 



Figure 6: (a) In vivo concentration of bed and hb TFs as a function 
of X. Figure reproduced from [TB]. (b) Theoretical and experimental hb 
concentration as a function of x. Here to compute ah{x) we take into 
account bed activation, hb autoregulation and hkb repression. The parameters 
B = lA and C = 1.1 are used to fit the in vivo data which is reproduced 
from [TB]. (c) In vivo hkb density as a function of x and exponential fit 
used in equation (|6ip . The in vivo data is from the Flex database [TTIBTIH^ . 
(Since the bed and hb densities from [TB] reproduced in figure ©a are for the 
beginning of the 14 A cycle of the development of Drosophilia [T^, we use the 
in vivo hkb densities for the first half of the 14 A cycle (Tl to T4) from the 
Flex database to fit ((6T|) .) (d) Hb density ah{x) for different values of C: 
for C=0.6 (red), 1.2 (green), 2.4 (blue). The value of C is adjusted to have the 
same value for the hb density at j: = as for the in vivo data. 
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When we apply our model to the regulation of hunchback by the bicoid 
morphogenic gradient, we focus on the sharp boundary in the hb density at the 
middle of the embryo. Several mechanisms accounting for the formation of sharp 
boundaries have been proposed: Some mechanisms [IHl ESI Ell ES] result from 
cooperative binding while others include a bistable gene [37] or the antagonistic 
action of a repressor and activator gradient jlSl EH ESI ES] • Here, we use neither 
the repression of hb in the posterior half nor the cooperative binding of bed, but 
we show that, in the absence of these two mechanisms, a smooth morphogenetic 
gradient can trigger a sharp boundary for an autoregulated gene. We also show 
that bistability of the autoregulated gene is not a requirement and that sharp 
boundaries can be generated by monostable autoregulated genes. We found 
the critical value for the transcription rate at which a bifurcation occurs and 
gave an estimate in equation (I102p . We further show that a bistable gene can 
produce a sharp boundary from a smooth gradient. Nevertheless, for a bistable 
hb, cells located on the posterior side of the embryo can switch from a low stable 
value to a high one in response to a sufficiently large perturbation. A repressor 
on the right hand side of the boundary would then be required to obtain a 
reliable boundary position. Our results show that an autoregulated gene close 
to bistability is sufficient to produce a sharp boundary. 

Here, we focused on a minimal mechanism that allows a morphogenetic 
gradient to trigger a sharp boundary in an autoregulated gene. In order to focus 
on this minimal system that produces sharp boundaries, neither hb-repression in 
the posterior half nor cooperative binding of bed are modeled. Both repression 
[46] and cooperative binding [l^ are already known to play a key role in the 
formation of the sharp boundary of hunchback and it would thus be interesting 
to expand our model to take them into account. With autoregulation, it would 
then be interesting to see how these three mechanisms, which appear to be 
redundant, produce sharp and robust boundaries in the embryo. 

6 Appendix 

6.1 The pdf of T(l,r2,) 

We compute here the pdf Prit) of the time T(l, Ug) a single TF takes to bind 
one of the Ug DNA specific targets. Decomposing the pdf by the event that the 
target is found after exactly k steps, we have; 



Using the probability p{ns) to bind to one of the Ug sites during a one dimen- 
sional motion along the DNA molecule, the probability Pk — Pr{k ID walk } 
to find a site during the fc*'* one dimensional DNA motion is given by: 



Prlt) = ^Pr{T{l,ns) < t\k ID walk}Pr{fc ID walk}. 



(62) 



fc=0 



Pk 



P{ns){l -p{ns)) 



(63) 
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A cycle is the concatenation of one and three dimensional motions. Both periods 
are characterized by random exponential times. The conditional search time for 
k cycles of DNA binding and free diffusion is then: 

k 

T{l,ns)\k = Y,irjU)+TdU)), (64) 
i=i 

where (Td(l), ..Td{k)) and (t/(1), ..,t/(2), ...Tf{k)) are respective the times spent 
bound to the DNA and freely diffusing in the nucleus. 

To compute Prit), we will use the characteristic function F of n^), 

oo 

F{x) ^Etie'*'') = f e''''pT{t)dt (65) 



Y,Gk{x)Pk, (66) 



fc=i 

where Gk is the characteristic function of {T{l,ns) < t\k ID walk). Since the 
random times Td{j) and Tf{j) are independent, the characteristic function of 
the sum ([M)) is the product of the characteristic functions: 

k 

Gk{x)^l[F,^(,){x)F,^^,^{x), (67) 
i=i 

where i<Vy:(j)(a;) and -FV^(j)(a;) are respectively the characteristic functions of the 
free diffusion time T/(j) and the time Td{j) bound to the DNA. Since these times 
are exponentially distributed: 

oo ^ 

F{x) = ^p(n,)(l-p(n,))'^-i— — — ^— -— ^ (70) 
k=i (1 - «2;r/) (1 - iXTd) 

_ P{ns) 



Finally, 



(1 - ixTd) (1 - ixTf) - 1 + p{ns) 



(71) 



The poles are given by the two roots of (1 — yr^) (1 — ?/t/) — 1 +p(n.,) = with 
v = ix: 



2TfTd 

_ {Td + Tf) + ^{Td + TfY - 'ip{ns)Tffd 

^2 — n— — ^ U, 

2T/Td 
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where, for p{ns) G [0; 1], the two roots ri and r2 are real positive. Decomposing 
the fraction ([TTj) gives: 

F{.) = __, '^""i. ^^"i . (73) 

TdTf [n - r2 ) [IX - n ) TdTf [n - r2 j (IX - r2 j 

(74) 



(ri - r2){ix - ri) {ri - r2){ix - r2)' 



where = ?'i^2 comes from the equation satisfied by ri and r2- By inverting 



oo 



the characteristic function prit) = ^Ex{e **^) = / e'*^F(x)(ix and since the 

— oo 

inverse transform of — ^J2r-^ exponential distribution of mean ^ , we obtain: 

M0 = ^^ + ^^. (75) 

r2 - ri ri ri - r2 r2 

We conclude that the distribution pT is the sum of two decreasing exponentials. 

6.2 Asymptotic pdf of T(1,?t,s) for p{ns) ^ 1 
We shall now study the approximation p(ns) <C 1, for which: 

ri « (76) 

r2 « ^ + ^, (77) 

and 

^^^-^m +e(^ + ^) , (78) 

Td + Tf \Td TfJ 

with e = p{ns)— — ^ — j s- < Efcsl, Since p{ng) <C 1 the second exponential 

converges faster to than the first and is further multiplied by a small coefficient 
e. 

For a time (^=2 + t ^ 1, we approximate the pdf pt given in equation 
(|75p by a single exponential: 

p^[t) - ^ltfLe-^\ (79) 

Td + Tf 

Since Td and r/ are both on the order of a few ms [10l[2T], the single exponential 
limit is valid for t larger than a few ms. The mean time T(l,ns) then reduces 
to: 

T(l,n.)«I^^. (80) 
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6.3 Computation of for /3 < 1 

Combining equations (1^ and the first order approximations in /3, the 

probability P„^ that all sites are simultaneously occupied is: 



oo 



" T (l- (81) 



Using the partial sum: 



S(^) = E If' (82) 

fc=0 



and the relations: 



E S = (S3) 

rif—ris 

± , ^-^^ = /^l^d., (84) 



we obtain: 



a 

P„^ = l-e-"S(a)- (inse-'^a''^-^ j ^-—^^^dx. (85) 



Using the change of variable a; = au, we can write: 

1 

(a) = 1 - e-"5(a) - /3n,e'" / ~ ^^""^ dw. (86) 



We shall now examine some properties of . For au > 0, e"" — S{au) > , 
thus P„^ is a decreasing function of (3. Indeed the partial derivative of P„^ in 
P is negative. Moreover, P„^ is an increasing function of a for /3 ^ 1: starting 
from expressions (|5T|) (which is equal to and differentiating with respect 

to a: 



e 



da ^-^ \ Uf — Us + 1 J \{nf — ly. nf\ 



Using Uf > Us and e sufficiently small, then for /3 < ^(1 — e), [l — +i ) ^ 

e and we obtain: 



> £6""- — > 0. (88) 



da [us — 1)! 
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and 9Pns is an increasing function of a. 

We now proceed with estimating P„^_i. Using equations ([T6| and ((20)) . we 
obtain: 

P„,_i - e-" ^ F(fc = n, (89) 

n/— ns — 1 

For /3 -C 1, using approximation for the term in — 1 and (|57|) for the 
other terms: 



Using relation ([55)1 . for ^ 1, we obtain: 



1 





Finally, when k < Us — 2 sites are occupied, using the first order approxima- 
tions for F{k\nf) in formula ([39]), we shall only retain the probabilities associated 
with A: or fc + 1 TFs in the nucleus, 

Pfc w P(fc|fc)P(fc)+P(fc|fc + l)P(fc + l) 



(n, -fc + l)y /c! (n^-A:) (fc + 1)! 



6.4 Critical value for bistability 

To compute the critical value Rc for which the profile ah (x) can be bistable, we 
use equation: 

a;, = i?(l-(l-P(x))(l-/K))). (93) 
For the critical value Rc, the function: 

a^Rc{l-{l-P{x)){l-f{a))), (94) 

is tangent to a a in etc for some value of P{x), where ac is the point where 
/ changes concavity (figure [5]). For k — Ug > I and (3 — 0, 

f{a) = P„., = 1 - e-"S{a). (95) 

/"(ac) is equivalent to S'(ac) - 25'(ac) + S"{ac) = 0, where: 

n s — 1 

S{a) = E ^- (96) 

fc=0 
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After some computations, we find that: 



ac = ns- 1. (97) 

Now at the critical value Rc, the function ([M]) is tangent to a in ac (see figure 
[S]) and we obtain the conditions: 

i?,(l-(l-P(x))(l -/(«,)))= a, (98) 

Rc (i - (1 - Pi^)) (i - f^K))) = 1- (99) 



After simplification, 

1 - /(«c) 

We then obtain for Ry. 



1 - „ g}) "[ = ajRc- (100) 



-Rc = ac + -^T- — — = ac + -F7 — ^ qTi — T- 101) 

£(ac) S'(ac) -S"(ac) 



Finally, using ([57)1 we obtain: 



S{ns - 1) 
■ (n. - 1)^ 



i?, = n,-l+(n.-l)! ,;^^^^ „ i^ . (102) 



and for Ug ~ 2, R^. = 3. 
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